Introduction to Python is brought to you by the Centre for the Analysis of Genome Evolution & Function (CAGEF) bioinformatics training initiative. This course was developed based on feedback on the needs and interests of the Department of Cell & Systems Biology and the Department of Ecology and Evolutionary Biology.
The structure of this course is a code-along style; it is 100% hands-on! A few hours prior to each lecture, the materials will be available for download on Quercus and also distributed via email. The teaching materials will consist of a Jupyter Lab Notebook with concepts, comments, instructions, and blank spaces that you will fill out with Python code along with the instructor. Other teaching materials include an HTML version of the notebook and, when required, datasets to import into Python. This learning approach will allow you to spend your time coding rather than taking notes!
As we go along, there will be some in-class challenge questions for you to solve either individually or in cooperation with your peers. Post lecture assessments will also be available (see syllabus for grading scheme and percentages of the final mark).
We'll take a blank-slate approach to Python here and assume that you know next to nothing about programming. From the beginning of this course to the end, we want to take you from one of these potential scenarios:
A pile of data (like an excel file or tab-separated file) full of experimental observations and you don't know what to do with it.
Maybe you're manipulating large tables all in Excel, making custom formulas and pivot tables with graphs. Now you have to repeat similar experiments and do the analysis again.
You're generating high-throughput data and there aren't any bioinformaticians around to help you sort it out.
You've heard about Python and what it can do for your data analysis but don't know what that means or where to start.
and get you to a point where you can:
Format your data correctly for analysis
Produce basic plots and perform exploratory analysis
Make functions and scripts for re-analysing existing or new data sets
Track your experiments in a digital notebook like Jupyter!
This week will be your crash course on Jupyter notebooks and the basic principles of Python. This is the beginning of a 7-week journey, so we'll ease into things by introducing the format and going over the very basics of Python.
At the end of this lecture we will have covered the following topics:
grey background - a package, function, code, command or directory. Backticks are also used for in-line code.
italics - an important term or concept or an individual file or folder
bold - heading or a term that is being defined
blue text - named or unnamed hyperlink
... - Within each coding cell this will indicate an area of code that students will need to complete for the code cell to run correctly.
Python is an open-source, general-purpose programming language created by Dutch programmer Guido van Rossum, who began working on it in 1989, and first released in 1991. From its conception, Python was meant to be elegant, powerful, and efficient, while being easy to use by non-professional programmers. Python’s name comes from “Monty Python’s Flying Circus”, a popular British comedy series from the late 1960s and early 1970s, and references to the television show are often found in Python tutorials, manuals, and help sources (words like “spam” and “eggs” are references to Monty Python).
In addition to user-friendliness, elegance, and simplicity, the open-source nature of Python has allowed a community of thousands of developers to participate in the evolution of the language, making it one of the most popular programming languages and the lingua franca for scripting in bioinformatics. Python is currently one of the most sought-after skills in data science, software development, and pipeline engineering.
As mentioned, Python is relatively simple to write, read, and therefore interpret, even for people who are not particularly programming-savvy, in contrast with languages such as C and C++, which are harder to read and write. Simplicity and efficiency are at the core of Python’s popularity, as users tend to spend more time writing code than dealing with complex grammar, syntax, or debugging (troubleshooting) code. These same characteristics have made Python very popular among data scientists.
Python is a high-level, object-oriented programming (OOP) language, meaning that:
In contrast, lower-level procedural languages such as Fortran and C often require users to write long, complicated, and detailed sets of instructions so the computer knows exactly what to do, making those languages harder to write, read, and debug. (By the way, the standard implementation of Python, CPython, is mostly written in C!)
The materials for this course have been prepared to work with Python 3.8.12, which runs on the University of Toronto JupyterHub.
Your work with Jupyter Notebook on JupyterHub will all take place within a new browser tab, with the address bar showing something similar to
https://jupyter.utoronto.ca/user/assigned-username-hexadecimal/tree/2022-01.IntroPython
All of this is running remotely on a University of Toronto server rather than your own machine.
You'll see a directory structure starting from your home folder, i.e. \2022-01.IntroPython\, with a folder called Lecture_01_Intro_Python within it. Clicking on that, you'll find Lecture_01.Intro.Python.skeleton.ipynb, which is the notebook we will use for today's code-along.
Each week our JupyterHub directory for the course will be updated with a new lecture and materials. Prior to class you should update your JupyterHub version of the class repository by clicking on this JupyterHub updater link. Note that it's the same link every week used to update the lectures.
We'll also provide this link at the beginning of each class as a reminder along with a link to a live-updating version of the skeleton. As each lecture progresses, this HTML file will refresh with the code that we are inserting into various parts of our lecture. This way, if you get lost, you have a chance to catch up.
At the end of each class, a complete version of the lecture will be released on Quercus as a PDF for you to download so don't worry too much if you fall behind. We'll try to make sections of each lecture relatively independent so you can progress at points even if you do get lost.
We've implemented the class this way to reduce the burden of having to install various programs. While installation can be a little tricky, it's really not that bad. For this introduction course, however, you don't need to go through all of that just to learn the basics of coding.
Jupyter Notebooks also give us the option of inserting "markdown" text, much like what you're reading at this very moment. This way we can intersperse ideas and information between our code blocks.
There is, however, an appendix section at the end of this lecture detailing how to install Jupyter Notebooks using the Anaconda environment. See section 7.0.0 for more information.
There are several ways to run Python code on your computer, either as single instructions or as sets of instructions called scripts.
| Example image of the University of Toronto JupyterHub terminal |
Perhaps the easiest and quickest way to start using Python is the interactive command line, also called interactive prompt, in a terminal. To try this option, open a terminal from your Jupyter Lab explorer under New > Terminal and type python. You should see three "greater than" symbols: >>>; they mean that Python is ready and waiting for instructions. Type (5 + 6) * 4 and run that command (enter) to see the result. Once you are ready to exit, type exit() to get out of the interactive prompt.
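Typing an expression at the >>> prompt evaluates it immediately. As a small illustration using the same numbers, the parentheses matter because Python follows the usual mathematical order of operations; the variable names below are arbitrary:

```python
# The expression from the interactive-prompt example
with_parentheses = (5 + 6) * 4    # 44: the addition inside the parentheses happens first
without_parentheses = 5 + 6 * 4   # 29: multiplication takes precedence over addition
print(with_parentheses, without_parentheses)
```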
The interactive prompt allows you to see the output of your Python code immediately. It is a quick way to do some statistical calculations or make a scatter plot. A major disadvantage of coding at the interactive prompt, though, is that the code you write will not be saved; there is no such thing as a "save as" option. Your code runs and is executed, yes, but if you want to keep all those instructions that you wrote, they need to be manually copied and pasted into a text editor such as Notepad, Vim, or Nano. This is a major drawback, as we tend to reuse the code we write. Additionally, a key concept in science is reproducibility, and what better way to make your analyses reproducible than saving a detailed list of the commands that you used to generate those results? Do not worry, however, because there are workarounds that overcome this limitation of interactive prompts without tedious and error-prone copy/paste methods. This is where Integrated Development Environments (IDEs) like Jupyter shine. Now, let us dive a bit more into IDEs.
Integrated development environments are tools that assist in code development, and they are available for almost every programming language; many IDEs actually offer multi-language support. IDEs offer Graphical User Interfaces (GUIs) that let you develop, edit, and browse Python programs, all from a single window. In addition to simplifying the task of saving the code that you write, IDEs also simplify the process of writing and debugging code (this last part is a huge advantage! In the next few lectures you will find out why). Examples of IDEs and editors that support Python include Atom, Spyder, Sublime Text, Notepad++, and Jupyter Lab.
| The Anaconda Navigator provides access to many IDEs in a single interface |
We will use Jupyter Notebook in this course. It shares many of its basic features with Jupyter Lab, including automatic "syntax highlighting": as you write your code, the editor colors keywords, strings, numbers, and comments differently, which makes issues that can break your code much easier to spot, such as a misplaced comma, an open parenthesis that was never closed, or inconsistent indentation (the latter is a big deal in Python). The main advantage of Jupyter Lab over other Python text editors is its support for Markdown, which allows you to combine Python code with text to generate documents in various formats (e.g. .doc, .pdf, and HTML). For instance, the document that you are reading right now was initially created in Jupyter Lab using its Markdown functionality. To read more about Jupyter Lab's capabilities click here
| Example image of a Jupyter Lab interface which can host multiple tabs for multiple projects |
A third alternative for running Python code is using scripts. A script is simply a set of instructions written in Python and saved as a Python file, which is easy to recognize because it has the extension .py (as in file_name.py). If you open the file, the first line often looks something like:
#!/usr/local/bin/python or #!/usr/bin/env python
Those two pieces of information - the extension and first line - let the computer know that the code in that file is a set of instructions that are written in Python and need to be interpreted as such. Those commands will be executed sequentially in their order of appearance unless otherwise indicated.
Scripts are a great resource to run multi-step programs that can take several hours to run, as you do not need to go back to your computer every few hours to run the next command. The script will take care of that for you.
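As a sketch of what such a file might contain, here is a minimal script. The file name sum_squares.py, the function sum_of_squares(), and the data are hypothetical, chosen only for illustration. The if __name__ == "__main__": guard is a common Python convention that runs main() only when the file is executed directly (e.g. python sum_squares.py), not when it is imported by other code:

```python
#!/usr/bin/env python
"""sum_squares.py: a hypothetical example script."""


def sum_of_squares(numbers):
    """Return the sum of the squares of the given numbers."""
    return sum(n ** 2 for n in numbers)


def main():
    # Each step runs in order, top to bottom, without anyone at the keyboard
    data = [1, 2, 3, 4]
    result = sum_of_squares(data)
    print("Sum of squares:", result)


# Only runs when the file is executed as a script, not when it is imported
if __name__ == "__main__":
    main()
```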
Behind the scenes of each Jupyter notebook a programming kernel is running. For instance, depending on setup, our notebooks run a Python kernel to interpret each code cell as if it were written specifically for the Python language.
As we move from one code cell to the next, all of the variables or objects we have created are stored in memory. We can refer to them as we run the code and move forward, but if you overwrite or change them by mistake, you may have to rerun multiple cells!
There are some options in the "Cell" menu that can alleviate these problems such as "Run All Above". If you think you've made a big error by overwriting a key object, you can use that option to "re-initialize" all of your previous code!
The run order of your code is also visible at the side of each code cell as [x]. When a code cell is still actively running it will be denoted as [*] since a number cannot be assigned to it. You'll also notice your kernel (top right of the menu bar) has a small circle that will be dark while running, and clear while idle.
Your Jupyter notebook will run in two modes most of the time. You'll either be editing a cell by adding code or text to it, or you'll be in "Command Mode", where you can run cells or edit the properties of the cells themselves. Below are some helpful shortcut keys for navigating your notebook.
Remember these friendly keys/shortcuts:
Esc to enter "Command Mode", which basically takes you outside of the cell.
Enter to edit a cell.
Arrow keys to navigate up and down (and within a cell).
Ctrl+Enter to run a cell (both code and markdown).
Shift+Enter to run the current cell and move to the next one below.
Ctrl+/ to quickly comment and uncomment single or multiple lines of code.
Tab can be used while coding to autocomplete variable, function, and file names; Shift+Tab shows the possible parameters of a function.
Shift+L to toggle line numbers within markdown and code cells.

In Command mode:
a inserts a new cell above the currently selected cell.
b inserts a new cell below the currently selected cell.
Note that new cells default to code cells.
m converts a cell to a markdown cell.
y converts a cell to a code cell.
r converts a cell to a raw NBConvert cell. This is most helpful when you wish to preserve code formatting without running it through the kernel.

If you can't remember your keyboard shortcuts, you can always access the same commands (and more!) through the menu bar.
¹ Kernels are separate processes, started by the server, that run your code.
Depending on your needs, you may find yourself doing the following:
Jupyter allows you to alternate between "markdown" notes and "code" that can be run or re-run on the fly.
Each data run and its results can be saved individually as a new notebook, or as new cells, to compare data and small changes to analyses!
Markdown is a lightweight markup language for formatting plain text (headings, lists, links, tables, and more), and it also accepts raw HTML. In a Jupyter notebook, Markdown cells sit alongside code cells, which lets you make HTML, PDF, and text documents that combine text and code, enhancing reproducibility, a key aspect of scientific work. Having everything in a single place also boosts productivity when interpreting results: there is no need to go back and forth between tabs, pages, and documents. Everything can be integrated into a single document, allowing for a more fluid narrative of the story that you are communicating to your audience (fewer distractions for you!). For example, the lines of code below and the text you are reading right now were all created in a Jupyter notebook. (Do not worry about the Python code just yet. We will get there sooner than you think.)
As mentioned, Jupyter also allows you to write in LaTeX, a document preparation system for writing mathematical notation. All it takes is to wrap LaTeX code between single dollar signs (\$) for inline notation, or between double dollar signs (\$\$), one pair at the beginning of the equation and one at the end, for a displayed equation. For example, the equation ***Yi = beta0 + beta1 xi + epsilon_i, i = 1, ..., N*** can be transformed into LaTeX code by adding some characters: ***Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, i=1, \dots, N***. Now, if we put \$\$ before and after the LaTeX code, this is what we get:

$$ Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad i=1, \dots, N $$

See? Just like that!
We can make plots too:
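# !pip install matplotlib
import pylab

# Time points from 0 to 2 s in steps of 0.01 s
t = pylab.arange(0.0, 2.0, 0.01)
# Sine wave: sin(2.5 * pi * t) has a frequency of 1.25 Hz
s = pylab.sin(2.5 * pylab.pi * t)
pylab.plot(t, s)
pylab.xlabel('time (s)')
pylab.ylabel('voltage (mV)')
pylab.title('Sine Wave')
pylab.grid(True)
pylab.show()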
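import random
import pylab

# Fix the random seed so the simulation is reproducible
random.seed(113)
samples = 1000
dice = []
# Roll two six-sided dice `samples` times and record each total
for i in range(samples):
    total = random.randint(1, 6) + random.randint(1, 6)
    dice.append(total)

# One bin per possible total (2 to 12), centred on each integer
pylab.hist(dice, bins=pylab.arange(1.5, 12.6, 1.0))
pylab.show()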
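If you want to check the histogram numerically, one option (not part of the original example) is to count how often each total appears with collections.Counter from the standard library. Because the seed and the order of the random.randint() calls are the same as above, this should reproduce the same 1,000 rolls:

```python
import random
from collections import Counter

random.seed(113)  # same seed as the histogram example
samples = 1000
# Same two rolls per sample, in the same order as the for loop above
dice = [random.randint(1, 6) + random.randint(1, 6) for _ in range(samples)]

# Count how many times each total (2 to 12) occurred
counts = Counter(dice)
for total in range(2, 13):
    print(total, counts[total])
```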
Here is an example of a table made in Markdown, showing some of the most popular Python libraries for data science:
| Library | Use |
|---|---|
| Pandas | Tabular-data processing |
| NumPy | Mathematical functions and objects manipulation |
| Matplotlib | Data visualization |
| SciPy | Statistical applications |
These are just a few examples of what you can do with Jupyter and Markdown. To find out more about how to get the most out of Markdown, click here.
Once you are finished writing your Python code and interpreting those results in a Jupyter notebook, you can render the notebook into pdf, html, and many other formats. There are several ways to achieve this. The easiest option is to go to File > Download as, and select the format that you want for your output. You will be asked if you want to see the output or save it.
Jupyter notebooks can also be rendered via the terminal by running jupyter nbconvert --to html my_notebook.ipynb, where my_notebook.ipynb is the name of the Jupyter notebook to be converted into HTML. The new HTML file will have the same name as the original file but with the extension .html. If you prefer your document as a PDF, replace html with pdf in the command above: jupyter nbconvert --to pdf my_notebook.ipynb (note that PDF conversion also requires a LaTeX installation).
So far, we have covered what Python is and why it is so popular, and we have become familiar with Jupyter Notebook, including its Markdown capabilities. We are now ready to start coding in Python.
| Credit: https://www.testbytes.net/blog/programming-memes/ |
As in any other field, best practices have evolved over the years and act as a set of standards that make processes more efficient and safer. In programming, these standards are rather informal (unless the company where you work dictates otherwise), but they are also intended to help generate better code and software. Here are some highlights about coding best practices.
# symbol
Why bother?
Your worst collaborator is potentially you in 6 days or 6 months. Do you remember what you had for breakfast last Tuesday?
You can annotate your code for selfish reasons, or altruistic reasons, but please take the time to annotate your code.
How do I start?
It is, in general, part of best coding practices to keep things tidy and organized.
A hash symbol (#) marks a comment. Inside a code cell in a Jupyter Notebook, or anywhere in a Python script, all text after a # on the same line is ignored by Python (many other programming languages, such as R, use the same convention). Comments are very useful for noting changes to your code, as well as for detailed explanations of your scripts.
Put a description of what you are doing near your code at every process, decision point, or non-default argument in a function. For example, why you selected k=6 for an analysis, or the Spearman over Pearson option for your correlation matrix, or quantile over median normalization, or why you made the decision to filter out certain samples.
Break your code into sections to make it readable. Scripts are just a series of steps and major steps should be titled/outlined with your reasoning - much like when presenting your research.
Give your objects informative object names that are not the same as function names.
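To see why object names should not match function names, here is a small sketch using the built-in sum() function; the variable names are made up for illustration. Assigning a value to sum hides the built-in function until the variable is deleted:

```python
numbers = [3, 1, 2]

# Good: an informative name that does not clash with a built-in function
numbers_total = sum(numbers)
print(numbers_total)  # 6

# Bad: naming a variable `sum` hides the built-in sum() function
sum = 10
try:
    sum(numbers)  # `sum` is now an int, so calling it fails
except TypeError as err:
    print("TypeError:", err)

# Deleting the variable makes the built-in function visible again
del sum
print(sum(numbers))  # 6
```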
Comments may/should appear in three places:
#---------- Here's a section of code with a specific function ----------#
# Python knows this is a comment (notice the hash-tag).
# Comments can be added:
# At the beginning of the script, describe the purpose of your script and what you are trying to solve/achieve.
5 + 4 * 6 - 0  # In-line: describe a part of your code whose purpose is not obvious.
29
Keep in mind that comments are only useful in Python code that you can, or want to, save. If you are working at the interactive prompt, neither your code nor your comments will be saved.
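One detail worth knowing: a # only starts a comment when it appears outside of quotes. A minimal illustration (the variable names are arbitrary):

```python
# Everything after a # on a line is ignored by Python
greeting = "Hello"  # this comment has no effect on the code

# A # inside quotes is part of the string, not the start of a comment
hashtag = "#python"
print(greeting, hashtag)  # prints: Hello #python
```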
Consistency in naming conventions will allow you to recognize at first glance functions and variables in your own code.
Stylistically, you have the following options:
The most important aspects of naming conventions are being concise and consistent! Throughout this course you'll most often see the underscore-separated method to name variables.
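As a quick illustration of some common naming styles (the variable names are hypothetical), note that Python treats each spelling as a separate variable. PEP 8, Python's official style guide, recommends the underscore-separated style (snake_case) for variables and functions:

```python
gene_count = 42   # snake_case: lowercase words separated by underscores (used in this course)
geneCount = 7     # camelCase: first word lowercase, later words capitalized
GeneCount = 0     # PascalCase: every word capitalized (used in Python for class names)

# Three different names, so three different variables
print(gene_count, geneCount, GeneCount)  # 42 7 0
```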
When we run a code cell, Jupyter automatically displays only a single result. This applies even if the cell contains multiple lines that would each, on their own, generate an output. In a terminal, you would type and execute each call on a separate line and see each result in turn. In Jupyter, however, only the value of the last expression in a cell is displayed automatically. For the moment, to remedy this, we will use the print() function to see the output from multiple calls.
# Compare the output of this code cell to the next!
5+6
10 + 2
12
print(5 + 6)
print(10 + 2)
# Look we can use the ";" to perform multiple actions on the same line
print(5+6); print(10 +2)
# The print function can even print multiple outputs in the same call!
print(5+6, 10+2)
11
12
11
12
11 12
Perhaps a disadvantage of this method is that the output is printed as a single line, which can sometimes make it harder to identify which output belongs to each command. Not a big deal for us right now.
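If that single-line output gets confusing, one option is to label each value, or to use the sep argument of print() to put each value on its own line. A short sketch (the names a and b are arbitrary):

```python
a = 5 + 6
b = 10 + 2

# Label each value so it is clear which output belongs to which calculation
print("5 + 6 =", a)   # 5 + 6 = 11
print("10 + 2 =", b)  # 10 + 2 = 12

# Or print several values on separate lines with the sep argument
print(a, b, sep="\n")
```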
A variable is simply a name that refers to a value, where a value can be a letter, a number, or combinations of both. Variables allow you to store objects (remember that Python is object-oriented, so everything in Python is an object), which makes your life easier because you do not need to re-write numbers and results, or even copy-paste them, every time you need to use them in your code. By simply typing the variable's name, you can access the results of your analyses or the data that you imported.
To see the value of a variable you just created, use the print() function (in Jupyter, a variable name on the last line of a cell is also displayed). The rules for naming variables in Python include:
Names must start with a letter (A-Z, a-z) or _ (underscore); digits may appear after the first character.
Names cannot contain special characters such as . (period), @, $, or %.
Names are case-sensitive: variable_name and Variable_Name are completely different to the interpreter.
Reserved words cannot be used as variable names. They are called "reserved" because they already mean something in Python, and Python will throw an error if you try to use those words to name variables. The reserved words are and, del, from, None, True, as, elif, global, nonlocal, try, assert, else, if, not, while, break, except, import, or, with, class, False, in, pass, yield, continue, finally, is, raise, async, def, for, lambda, return, and await.
Note that this list of reserved words can change between Python versions, so the easiest approach is to ask Python itself. Let's use another function, help(), while we're at it. We'll talk more about the help() function later in this lecture.
# Use help to find information like "keywords"
help("keywords")
Here is a list of the Python keywords.  Enter any keyword to get more help.

False               class               from                or
None                continue            global              pass
True                def                 if                  raise
and                 del                 import              return
as                  elif                in                  try
assert              else                is                  while
async               except              lambda              with
await               finally             nonlocal            yield
break               for                 not
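Besides help("keywords"), the standard-library keyword module offers a programmatic way to ask the same question. keyword.kwlist holds the reserved words of the Python version you are running, so its length may differ between versions:

```python
import keyword

# Reserved words for the running Python version
print(len(keyword.kwlist))
print(keyword.kwlist[:5])

# Check individual words
print(keyword.iskeyword("if"))    # True
print(keyword.iskeyword("gene"))  # False
```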
Depending on how many variables are assigned at once, assignment can be classified as single or multiple. Note that assigning a variable does not produce any output.
In single assignment, only one variable is assigned at a time, as in the following example:
# Assign your first variable here
variable_a = 5
print(variable_a)
# Assign a string to a variable
variable_b = "python: snake or software or hilarious"
print(variable_b)
5
python: snake or software or hilarious
When you try to use the reserved word if to name a variable, Python will throw an invalid syntax error:
# Attempt to assign a keyword to a variable
if = 5
  File "<ipython-input-10-c6b8cca444d9>", line 3
    if = 5
       ^
SyntaxError: invalid syntax
Reserved words are easy to spot thanks to Jupyter's syntax highlighting, as they are shown in a different color than the rest of your Python code.
Trying to assign a value to a number, or naming a variable that starts with a number, will also throw an error:
# Name a variable using a number
5 = "This will not work"
  File "<ipython-input-11-c6954098b755>", line 2
    5 = "This will not work"
    ^
SyntaxError: cannot assign to literal
# Name a variable starting with a number. Look at the syntax highlighting!
5again = "This will also not work"
again5 = "this will"
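To check a candidate name without triggering an error, you could combine the string method str.isidentifier(), which applies the naming rules, with keyword.iskeyword(), since reserved words such as if pass isidentifier() too. The helper name is_valid_variable_name() is hypothetical:

```python
import keyword


def is_valid_variable_name(name):
    """Hypothetical helper: True if `name` can be used as a variable name."""
    # isidentifier() checks the naming rules; keywords pass it, so exclude them
    return name.isidentifier() and not keyword.iskeyword(name)


for candidate in ["again5", "5again", "if", "variable_a", "my-var"]:
    print(candidate, is_valid_variable_name(candidate))
```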
Python can also assign several variables simultaneously. Use commas to separate the different variables to be created (left side of the equal sign) and their values (right side).
# Assign two values to two variables. Note that quotes change the object type!
multiple_assignment_1, multiple_assignment_2 = 5, "6"
# Print your newly assigned variables
print(multiple_assignment_1)
print(multiple_assignment_2)
5
6
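Multiple assignment pairs names and values by position. Below is a short sketch with made-up values showing a common use, swapping two variables in one line, and what happens when the number of names and values does not match:

```python
# Values on the right are matched, in order, to the names on the left
x, y = 5, "6"

# A handy trick: swap two values without a temporary variable
x, y = y, x
print(x, y)  # 6 5

# The number of names and values must match
try:
    a, b = 1, 2, 3
except ValueError as err:
    print("ValueError:", err)
```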
# Fix this code so we can see the type for multiple_assignment_2
type(multiple_assingmetn_2)
| Functions perform all the heavy lifting for you, making your life easier! |
We already used the functions: print(), help(), and type(). Functions are at the core of Python's capabilities and power. They are sets of instructions that Python recognizes and executes if no errors are detected. For example, there is already a function to calculate the standard deviation so you do not need to define/create that function if you need it; you just use it. It is that simple.
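For instance, the standard library's statistics module provides stdev() (sample standard deviation) and pstdev() (population standard deviation), so there is nothing to write yourself. The measurements below are invented for illustration:

```python
import statistics

# Hypothetical measurements (made-up numbers)
measurements = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

print(statistics.mean(measurements))    # 5.0
print(statistics.stdev(measurements))   # sample standard deviation (divides by n - 1)
print(statistics.pstdev(measurements))  # population standard deviation: 2.0
```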
As more and more functions are created by developers, these functions are better off kept in Python scripts (remember those files with the extension .py?). Each one of those scripts is called a module. Here is the definition of modules from Python's documentation:
Simple, right? Now, things can get messy when we have too many python modules hanging around as individual entities; that is why modules are also organized in bigger collections called packages or libraries.
Packages are a way of structuring Python’s module namespace by using “dotted module names”. Packages exist to help organize modules and provide a naming hierarchy, which in turn keeps things organized in the same way that directories and subdirectories help you organize files on your computer. The concept of libraries is what has made programming languages such as Python and R so popular, as anyone can build packages and submit them for public use. This has allowed people from all disciplines to participate in the development of packages relevant to their fields.
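To make the module/package idea concrete, the sketch below builds a tiny throwaway package in a temporary folder and imports it with a dotted module name. The names mypkg, greetings, and hello() are hypothetical; the empty __init__.py file is what marks the folder as a regular package:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a hypothetical package on disk:
#   mypkg/__init__.py   -> marks the folder as a package
#   mypkg/greetings.py  -> a module (just a .py file) with one function
root = Path(tempfile.mkdtemp())
package_dir = root / "mypkg"
package_dir.mkdir()
(package_dir / "__init__.py").write_text("")
(package_dir / "greetings.py").write_text(
    "def hello(name):\n"
    "    return 'Hello, ' + name + '!'\n"
)

# Let Python search the temporary folder, then import using a dotted name
sys.path.insert(0, str(root))
importlib.invalidate_caches()
import mypkg.greetings

print(mypkg.greetings.hello("CAGEF"))  # Hello, CAGEF!
```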
In order to use packages, they need to be installed first, except for those that are installed by default (built-in) when Python is installed. To see a list of all packages already installed on your computer, run !pydoc3 modules in Jupyter or pydoc3 modules in a terminal.
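From within Python itself, one rough equivalent (its output is not identical to pydoc3's) is pkgutil.iter_modules(), and importlib.util.find_spec() can check whether a single package can be found. The helper is_installed() is hypothetical:

```python
import importlib.util
import pkgutil

# Top-level modules and packages found on sys.path (built-in modules are not included)
names = sorted(module.name for module in pkgutil.iter_modules())
print(len(names), "modules found; first few:", names[:5])


def is_installed(package_name):
    """Hypothetical helper: True if Python can locate `package_name` for import."""
    return importlib.util.find_spec(package_name) is not None


print(is_installed("json"))        # True: json ships with Python
print(is_installed("matplotlib"))  # True only if matplotlib has been installed
```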
Libraries can be installed from within Jupyter with the command:
!pip install package_name (pip is a package manager for Python).
Here is your first task: install matplotlib by running !pip install matplotlib from a code cell.
# Install the matplotlib package
!pip install matplotlib
Requirement already satisfied: matplotlib in c:\users\mokca\anaconda3\lib\site-packages (3.3.4)
Requirement already satisfied: kiwisolver>=1.0.1 in c:\users\mokca\anaconda3\lib\site-packages (from matplotlib) (1.3.1)
Requirement already satisfied: pillow>=6.2.0 in c:\users\mokca\anaconda3\lib\site-packages (from matplotlib) (8.2.0)
Requirement already satisfied: cycler>=0.10 in c:\users\mokca\anaconda3\lib\site-packages (from matplotlib) (0.10.0)
Requirement already satisfied: python-dateutil>=2.1 in c:\users\mokca\anaconda3\lib\site-packages (from matplotlib) (2.8.1)
Requirement already satisfied: numpy>=1.15 in c:\users\mokca\anaconda3\lib\site-packages (from matplotlib) (1.19.2)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.3 in c:\users\mokca\anaconda3\lib\site-packages (from matplotlib) (2.4.7)
Requirement already satisfied: six in c:\users\mokca\anaconda3\lib\site-packages (from cycler>=0.10->matplotlib) (1.15.0)
Sometimes, installing packages is not a straightforward process. Above you can see that a number of requirements were also checked: some packages depend on the installation of other packages. In some cases this can lead to conflicts, so be prepared for those battles with your computer. Things will get easier as your troubleshooting skills improve.
In addition to the functions/modules/packages that you can install as add-ons, Python also has built-in functions. Some of those built-in functions perform basic mathematical operations such as sum(), min(), and max(); others are meant to be used by much more advanced users. For a complete list of built-in functions and what they do, click here.
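A quick sketch of a few built-in functions applied to a made-up list of values; none of them requires an installation or an import:

```python
# Hypothetical values for illustration
values = [12, 7, 3, 25, 8]

print(sum(values))  # 55
print(min(values))  # 3
print(max(values))  # 25
print(len(values))  # 5
print(round(sum(values) / len(values), 2))  # 11.0 (the mean)
```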
import
If all goes well, installing a package is a one-time event (except on JupyterHub). Once packages are successfully installed, they must be imported (loaded) into your session before you can use the modules and functions they contain. To import a given package, run the command import followed by the name of the package. Let us import matplotlib, a popular library for data visualization:
# !pip install matplotlib
import matplotlib
Another option is to import a package under a shorter name, called an alias:
import module_name as alias. Aliases are optional but very handy when modules have long names (you will see why). The matplotlib library can then be imported as mp (or any alias you want!)
# import matplotlib and then give it an alias
import matplotlib as mp
Things are starting to get a bit confusing with all these concepts and definitions so here is a brief recap: functions are the pieces of code that actually do the job in Python; they live in Python scripts called modules, which in turn live in packages, which are collections of modules. This hierarchy helps to keep things organized.
As per their definition, packages can contain many modules, and often we only need to access one or two of those modules. There are two ways to import specific modules: dot notation and from. This is how we import the pyplot module from matplotlib using dot notation:
# Import using the dot notation and add an alias
import matplotlib.pyplot as plt
and with from:
# Use the from keyword to import a module from the matplotlib package
from matplotlib import pyplot as plt
Both codes achieve the same results. Let us use the pyplot module from the matplotlib library:
# Import the pyplot module from matplotlib
import matplotlib.pyplot as plt
year = [1950, 1970, 1990, 2010]
population = [2.5, 3.6, 5.2, 6.9]
# Use the plt functions to generate a plot
plt.plot(year, population, "o-")
plt.title("Global population")
plt.xlabel("Year")
plt.ylabel("Population in thousands of millions \n(or North American billions)")
plt.show()
Now you can see why it is handy to use short aliases! Otherwise we would have had to type matplotlib.pyplot five times.
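Because matplotlib may not be installed everywhere, here is the same idea sketched with standard-library modules: a plain import, an alias, a from import, and dot notation all give you access to the same underlying objects:

```python
import math
import math as m
from math import sqrt
import os.path as osp
from os import path

# All three forms reach the same function
print(math.sqrt(16), m.sqrt(16), sqrt(16))  # 4.0 4.0 4.0
print(m is math, sqrt is math.sqrt)          # True True

# Dot notation and `from` give the same submodule
print(osp is path)  # True
```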
Packages, modules, and functions are accompanied by documentation that describe what they do, what type of data they use, and, of course, how to use them. Documentation can be accessed via the help() function:
# Find out more about the matplotlib package
help(matplotlib)
Help on package matplotlib:
NAME
matplotlib - An object-oriented plotting library.
DESCRIPTION
A procedural interface is provided by the companion pyplot module,
which may be imported directly, e.g.::
import matplotlib.pyplot as plt
or using ipython::
ipython
at your terminal, followed by::
In [1]: %matplotlib
In [2]: import matplotlib.pyplot as plt
at the ipython shell prompt.
For the most part, direct use of the object-oriented library is encouraged when
programming; pyplot is primarily for working interactively. The exceptions are
the pyplot functions `.pyplot.figure`, `.pyplot.subplot`, `.pyplot.subplots`,
and `.pyplot.savefig`, which can greatly simplify scripting.
Modules include:
:mod:`matplotlib.axes`
The `~.axes.Axes` class. Most pyplot functions are wrappers for
`~.axes.Axes` methods. The axes module is the highest level of OO
access to the library.
:mod:`matplotlib.figure`
The `.Figure` class.
:mod:`matplotlib.artist`
The `.Artist` base class for all classes that draw things.
:mod:`matplotlib.lines`
The `.Line2D` class for drawing lines and markers.
:mod:`matplotlib.patches`
Classes for drawing polygons.
:mod:`matplotlib.text`
The `.Text` and `.Annotation` classes.
:mod:`matplotlib.image`
The `.AxesImage` and `.FigureImage` classes.
:mod:`matplotlib.collections`
Classes for efficient drawing of groups of lines or polygons.
:mod:`matplotlib.colors`
Color specifications and making colormaps.
:mod:`matplotlib.cm`
Colormaps, and the `.ScalarMappable` mixin class for providing color
mapping functionality to other classes.
:mod:`matplotlib.ticker`
Calculation of tick mark locations and formatting of tick labels.
:mod:`matplotlib.backends`
A subpackage with modules for various GUI libraries and output formats.
The base matplotlib namespace includes:
`~matplotlib.rcParams`
Default configuration settings; their defaults may be overridden using
a :file:`matplotlibrc` file.
`~matplotlib.use`
Setting the Matplotlib backend. This should be called before any
figure is created, because it is not possible to switch between
different GUI backends after that.
Matplotlib was initially written by John D. Hunter (1968-2012) and is now
developed and maintained by a host of others.
Occasionally the internal documentation (python docstrings) will refer
to MATLAB®, a registered trademark of The MathWorks, Inc.
PACKAGE CONTENTS
_animation_data
_cm
_cm_listed
_color_data
_constrained_layout
_contour
_image
_internal_utils
_layoutbox
_mathtext_data
_path
_pylab_helpers
_qhull
_text_layout
_tri
_ttconv
_version
afm
animation
artist
axes (package)
axis
backend_bases
backend_managers
backend_tools
backends (package)
bezier
blocking_input
category
cbook (package)
cm
collections
colorbar
colors
compat (package)
container
contour
dates
docstring
dviread
figure
font_manager
fontconfig_pattern
ft2font
gridspec
hatch
image
legend
legend_handler
lines
markers
mathtext
mlab
offsetbox
patches
path
patheffects
projections (package)
pylab
pyplot
quiver
rcsetup
sankey
scale
sphinxext (package)
spines
stackplot
streamplot
style (package)
table
testing (package)
tests (package)
texmanager
text
textpath
ticker
tight_bbox
tight_layout
transforms
tri (package)
ttconv
type1font
units
widgets
CLASSES
builtins.FileNotFoundError(builtins.OSError)
ExecutableNotFoundError
builtins.dict(builtins.object)
RcParams(collections.abc.MutableMapping, builtins.dict)
collections.abc.MutableMapping(collections.abc.Mapping)
RcParams(collections.abc.MutableMapping, builtins.dict)
class ExecutableNotFoundError(builtins.FileNotFoundError)
| Error raised when an executable that Matplotlib optionally
| depends on can't be found.
|
| Method resolution order:
| ExecutableNotFoundError
| builtins.FileNotFoundError
| builtins.OSError
| builtins.Exception
| builtins.BaseException
| builtins.object
|
| Data descriptors defined here:
|
| __weakref__
| list of weak references to the object (if defined)
|
| ----------------------------------------------------------------------
| Methods inherited from builtins.FileNotFoundError:
|
| __init__(self, /, *args, **kwargs)
| Initialize self. See help(type(self)) for accurate signature.
|
| ----------------------------------------------------------------------
| Methods inherited from builtins.OSError:
|
| __reduce__(...)
| Helper for pickle.
|
| __str__(self, /)
| Return str(self).
|
| ----------------------------------------------------------------------
| Static methods inherited from builtins.OSError:
|
| __new__(*args, **kwargs) from builtins.type
| Create and return a new object. See help(type) for accurate signature.
|
| ----------------------------------------------------------------------
| Data descriptors inherited from builtins.OSError:
|
| characters_written
|
| errno
| POSIX exception code
|
| filename
| exception filename
|
| filename2
| second exception filename
|
| strerror
| exception strerror
|
| winerror
| Win32 exception code
|
| ----------------------------------------------------------------------
| Methods inherited from builtins.BaseException:
|
| __delattr__(self, name, /)
| Implement delattr(self, name).
|
| __getattribute__(self, name, /)
| Return getattr(self, name).
|
| __repr__(self, /)
| Return repr(self).
|
| __setattr__(self, name, value, /)
| Implement setattr(self, name, value).
|
| __setstate__(...)
|
| with_traceback(...)
| Exception.with_traceback(tb) --
| set self.__traceback__ to tb and return self.
|
| ----------------------------------------------------------------------
| Data descriptors inherited from builtins.BaseException:
|
| __cause__
| exception cause
|
| __context__
| exception context
|
| __dict__
|
| __suppress_context__
|
| __traceback__
|
| args
class RcParams(collections.abc.MutableMapping, builtins.dict)
| RcParams(*args, **kwargs)
|
| A dictionary object including validation.
|
| Validating functions are defined and associated with rc parameters in
| :mod:`matplotlib.rcsetup`.
|
| See Also
| --------
| :ref:`customizing-with-matplotlibrc-files`
|
| Method resolution order:
| RcParams
| collections.abc.MutableMapping
| collections.abc.Mapping
| collections.abc.Collection
| collections.abc.Sized
| collections.abc.Iterable
| collections.abc.Container
| builtins.dict
| builtins.object
|
| Methods defined here:
|
| __getitem__(self, key)
| x.__getitem__(y) <==> x[y]
|
| __init__(self, *args, **kwargs)
| Initialize self. See help(type(self)) for accurate signature.
|
| __iter__(self)
| Yield sorted list of keys.
|
| __len__(self)
| Return len(self).
|
| __repr__(self)
| Return repr(self).
|
| __setitem__(self, key, val)
| Set self[key] to value.
|
| __str__(self)
| Return str(self).
|
| copy(self)
| D.copy() -> a shallow copy of D
|
| find_all(self, pattern)
| Return the subset of this RcParams dictionary whose keys match,
| using :func:`re.search`, the given ``pattern``.
|
| .. note::
|
| Changes to the returned dictionary are *not* propagated to
| the parent RcParams dictionary.
|
| ----------------------------------------------------------------------
| Data descriptors defined here:
|
| __dict__
| dictionary for instance variables (if defined)
|
| __weakref__
| list of weak references to the object (if defined)
|
| ----------------------------------------------------------------------
| Data and other attributes defined here:
|
| __abstractmethods__ = frozenset({'__delitem__'})
|
| validate = {'_internal.classic_mode': <function validate_bool>, 'agg.p...
|
| ----------------------------------------------------------------------
| Methods inherited from collections.abc.MutableMapping:
|
| __delitem__(self, key)
|
| clear(self)
| D.clear() -> None. Remove all items from D.
|
| pop(self, key, default=<object object at 0x0000024BBB1C4140>)
| D.pop(k[,d]) -> v, remove specified key and return the corresponding value.
| If key is not found, d is returned if given, otherwise KeyError is raised.
|
| popitem(self)
| D.popitem() -> (k, v), remove and return some (key, value) pair
| as a 2-tuple; but raise KeyError if D is empty.
|
| setdefault(self, key, default=None)
| D.setdefault(k[,d]) -> D.get(k,d), also set D[k]=d if k not in D
|
| update(self, other=(), /, **kwds)
| D.update([E, ]**F) -> None. Update D from mapping/iterable E and F.
| If E present and has a .keys() method, does: for k in E: D[k] = E[k]
| If E present and lacks .keys() method, does: for (k, v) in E: D[k] = v
| In either case, this is followed by: for k, v in F.items(): D[k] = v
|
| ----------------------------------------------------------------------
| Methods inherited from collections.abc.Mapping:
|
| __contains__(self, key)
|
| __eq__(self, other)
| Return self==value.
|
| get(self, key, default=None)
| D.get(k[,d]) -> D[k] if k in D, else d. d defaults to None.
|
| items(self)
| D.items() -> a set-like object providing a view on D's items
|
| keys(self)
| D.keys() -> a set-like object providing a view on D's keys
|
| values(self)
| D.values() -> an object providing a view on D's values
|
| ----------------------------------------------------------------------
| Data and other attributes inherited from collections.abc.Mapping:
|
| __hash__ = None
|
| __reversed__ = None
|
| ----------------------------------------------------------------------
| Class methods inherited from collections.abc.Collection:
|
| __subclasshook__(C) from abc.ABCMeta
| Abstract classes can override this to customize issubclass().
|
| This is invoked early on by abc.ABCMeta.__subclasscheck__().
| It should return True, False or NotImplemented. If it returns
| NotImplemented, the normal algorithm is used. Otherwise, it
| overrides the normal algorithm (and the outcome is cached).
|
| ----------------------------------------------------------------------
| Methods inherited from builtins.dict:
|
| __ge__(self, value, /)
| Return self>=value.
|
| __getattribute__(self, name, /)
| Return getattr(self, name).
|
| __gt__(self, value, /)
| Return self>value.
|
| __le__(self, value, /)
| Return self<=value.
|
| __lt__(self, value, /)
| Return self<value.
|
| __ne__(self, value, /)
| Return self!=value.
|
| __sizeof__(...)
| D.__sizeof__() -> size of D in memory, in bytes
|
| ----------------------------------------------------------------------
| Class methods inherited from builtins.dict:
|
| fromkeys(iterable, value=None, /) from abc.ABCMeta
| Create a new dictionary with keys from iterable and values set to value.
|
| ----------------------------------------------------------------------
| Static methods inherited from builtins.dict:
|
| __new__(*args, **kwargs) from builtins.type
| Create and return a new object. See help(type) for accurate signature.
FUNCTIONS
checkdep_ps_distiller(s)
[*Deprecated*]
Notes
-----
.. deprecated:: 3.2
\
checkdep_usetex(s)
compare_versions(a, b)
[*Deprecated*] Return whether version *a* is greater than or equal to version *b*.
Notes
-----
.. deprecated:: 3.2
get_backend()
Return the name of the current backend.
See Also
--------
matplotlib.use
get_cachedir()
Return the string path of the cache directory.
The procedure used to find the directory is the same as for
_get_config_dir, except using ``$XDG_CACHE_HOME``/``$HOME/.cache`` instead.
get_configdir()
Return the string path of the the configuration directory.
The directory is chosen as follows:
1. If the MPLCONFIGDIR environment variable is supplied, choose that.
2. On Linux, follow the XDG specification and look first in
``$XDG_CONFIG_HOME``, if defined, or ``$HOME/.config``. On other
platforms, choose ``$HOME/.matplotlib``.
3. If the chosen directory exists and is writable, use that as the
configuration directory.
4. Else, create a temporary directory, and use it as the configuration
directory.
get_data_path(*, _from_rc=None)
Return the path to Matplotlib data.
get_home()
[*Deprecated*] Return the user's home directory.
If the user's home directory cannot be found, return None.
Notes
-----
.. deprecated:: 3.2
interactive(b)
Set whether to redraw after every plotting command (e.g. `.pyplot.xlabel`).
is_interactive()
Return whether to redraw after every plotting command.
is_url(filename)
Return True if string is an http, ftp, or file URL path.
matplotlib_fname()
Get the location of the config file.
The file location is determined in the following order
- ``$PWD/matplotlibrc``
- ``$MATPLOTLIBRC`` if it is not a directory
- ``$MATPLOTLIBRC/matplotlibrc``
- ``$MPLCONFIGDIR/matplotlibrc``
- On Linux,
- ``$XDG_CONFIG_HOME/matplotlib/matplotlibrc`` (if ``$XDG_CONFIG_HOME``
is defined)
- or ``$HOME/.config/matplotlib/matplotlibrc`` (if ``$XDG_CONFIG_HOME``
is not defined)
- On other platforms,
- ``$HOME/.matplotlib/matplotlibrc`` if ``$HOME`` is defined
- Lastly, it looks in ``$MATPLOTLIBDATA/matplotlibrc``, which should always
exist.
rc(group, **kwargs)
Set the current `.rcParams`. *group* is the grouping for the rc, e.g.,
for ``lines.linewidth`` the group is ``lines``, for
``axes.facecolor``, the group is ``axes``, and so on. Group may
also be a list or tuple of group names, e.g., (*xtick*, *ytick*).
*kwargs* is a dictionary attribute name/value pairs, e.g.,::
rc('lines', linewidth=2, color='r')
sets the current `.rcParams` and is equivalent to::
rcParams['lines.linewidth'] = 2
rcParams['lines.color'] = 'r'
The following aliases are available to save typing for interactive users:
===== =================
Alias Property
===== =================
'lw' 'linewidth'
'ls' 'linestyle'
'c' 'color'
'fc' 'facecolor'
'ec' 'edgecolor'
'mew' 'markeredgewidth'
'aa' 'antialiased'
===== =================
Thus you could abbreviate the above call as::
rc('lines', lw=2, c='r')
Note you can use python's kwargs dictionary facility to store
dictionaries of default parameters. e.g., you can customize the
font rc as follows::
font = {'family' : 'monospace',
'weight' : 'bold',
'size' : 'larger'}
rc('font', **font) # pass in the font dict as kwargs
This enables you to easily switch between several configurations. Use
``matplotlib.style.use('default')`` or :func:`~matplotlib.rcdefaults` to
restore the default `.rcParams` after changes.
Notes
-----
Similar functionality is available by using the normal dict interface, i.e.
``rcParams.update({"lines.linewidth": 2, ...})`` (but ``rcParams.update``
does not support abbreviations or grouping).
rc_context(rc=None, fname=None)
Return a context manager for temporarily changing rcParams.
Parameters
----------
rc : dict
The rcParams to temporarily set.
fname : str or path-like
A file with Matplotlib rc settings. If both *fname* and *rc* are given,
settings from *rc* take precedence.
See Also
--------
:ref:`customizing-with-matplotlibrc-files`
Examples
--------
Passing explicit values via a dict::
with mpl.rc_context({'interactive': False}):
fig, ax = plt.subplots()
ax.plot(range(3), range(3))
fig.savefig('example.png')
plt.close(fig)
Loading settings from a file::
with mpl.rc_context(fname='print.rc'):
plt.plot(x, y) # uses 'print.rc'
rc_file(fname, *, use_default_template=True)
Update `.rcParams` from file.
Style-blacklisted `.rcParams` (defined in
`matplotlib.style.core.STYLE_BLACKLIST`) are not updated.
Parameters
----------
fname : str or path-like
A file with Matplotlib rc settings.
use_default_template : bool
If True, initialize with default parameters before updating with those
in the given file. If False, the current configuration persists
and only the parameters specified in the file are updated.
rc_file_defaults()
Restore the `.rcParams` from the original rc file loaded by Matplotlib.
Style-blacklisted `.rcParams` (defined in
`matplotlib.style.core.STYLE_BLACKLIST`) are not updated.
rc_params(fail_on_error=False)
Construct a `RcParams` instance from the default Matplotlib rc file.
rc_params_from_file(fname, fail_on_error=False, use_default_template=True)
Construct a `RcParams` from file *fname*.
Parameters
----------
fname : str or path-like
A file with Matplotlib rc settings.
fail_on_error : bool
If True, raise an error when the parser fails to convert a parameter.
use_default_template : bool
If True, initialize with default parameters before updating with those
in the given file. If False, the configuration class only contains the
parameters specified in the file. (Useful for updating dicts.)
rcdefaults()
Restore the `.rcParams` from Matplotlib's internal default style.
Style-blacklisted `.rcParams` (defined in
`matplotlib.style.core.STYLE_BLACKLIST`) are not updated.
See Also
--------
matplotlib.rc_file_defaults
Restore the `.rcParams` from the rc file originally loaded by
Matplotlib.
matplotlib.style.use
Use a specific style file. Call ``style.use('default')`` to restore
the default style.
set_loglevel(level)
Set Matplotlib's root logger and root logger handler level, creating
the handler if it does not exist yet.
Typically, one should call ``set_loglevel("info")`` or
``set_loglevel("debug")`` to get additional debugging information.
Parameters
----------
level : {"notset", "debug", "info", "warning", "error", "critical"}
The log level of the handler.
Notes
-----
The first time this function is called, an additional handler is attached
to Matplotlib's root handler; this handler is reused every time and this
function simply manipulates the logger and handler's level.
test(verbosity=None, coverage=False, switch_backend_warn=<deprecated parameter>, recursionlimit=<deprecated parameter>, **kwargs)
Run the matplotlib test suite.
use(backend, *, force=True)
Select the backend used for rendering and GUI integration.
Parameters
----------
backend : str
The backend to switch to. This can either be one of the standard
backend names, which are case-insensitive:
- interactive backends:
GTK3Agg, GTK3Cairo, MacOSX, nbAgg,
Qt4Agg, Qt4Cairo, Qt5Agg, Qt5Cairo,
TkAgg, TkCairo, WebAgg, WX, WXAgg, WXCairo
- non-interactive backends:
agg, cairo, pdf, pgf, ps, svg, template
or a string of the form: ``module://my.module.name``.
force : bool, default: True
If True (the default), raise an `ImportError` if the backend cannot be
set up (either because it fails to import, or because an incompatible
GUI interactive framework is already running); if False, ignore the
failure.
See Also
--------
:ref:`backends`
matplotlib.get_backend
DATA
URL_REGEX = re.compile('^http://|^https://|^ftp://|^file:')
__bibtex__ = '@Article{Hunter:2007,\n Author = {Hunter, J. ...ishe...
defaultParams = {'_internal.classic_mode': [False, <function validate_...
default_test_modules = ['matplotlib.tests', 'mpl_toolkits.tests']
rcParams = RcParams({'_internal.classic_mode': False,
...nor.widt...
rcParamsDefault = RcParams({'_internal.classic_mode': False,
...n...
rcParamsOrig = RcParams({'_internal.classic_mode': False,
...nor....
VERSION
3.3.4
FILE
c:\users\mokca\anaconda3\lib\site-packages\matplotlib\__init__.py
It contains a lot of information and very specialized jargon that will become easier to understand as you get more exposure to Python and programming in general. In addition to these very detailed documents, more beginner-friendly options are available on the internet in the form of "user guides" and "tutorials". If you are new to Python, and more so to programming, I would encourage you to start with user guides and tutorials; they are easier/quicker to digest and apply. Documentation will be available for consultation if at any point you need a deeper understanding of a function, module, or package.
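help() prints the documentation string (docstring) attached to an object, and it works on a single function just as well as on a whole package. A minimal sketch using the built-in round() function; the same text is also available as a plain string through `inspect.getdoc()` if you want to search or print only part of it.

```python
import inspect

# Print the full help page for one built-in function
help(round)

# Grab the same documentation as a plain string
doc = inspect.getdoc(round)
print(doc.splitlines()[0])  # the first line is a one-sentence summary
```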
Now that we know a little more about libraries, packages and functions, we're going to change a setting in our Jupyter notebook by accessing the IPython core via IPython.core.interactiveshell. From there we'll change the ast_node_interactivity option, which controls which expression outputs are displayed when a cell runs.
The default value is last_expr (only the last expression in a cell is displayed), but we can also choose from: all, last, none, and last_expr_or_assign.
# Access options from the IPython core
from IPython.core.interactiveshell import InteractiveShell
# Change the value of ast_node_interactivity
InteractiveShell.ast_node_interactivity = "all"
As mentioned before, Python is an object-oriented programming language, thus everything in Python is an object. Among those objects we find so-called data types, which are the basic units with which we work and operate. In this section we will cover three of them: numbers, strings, and booleans.
Well, this is very self-explanatory. As basic as it sounds, this data type is just the numbers you already know. However, there are two main categories for numbers: integers and floats. Numbers like 5, 90, and 1,400 are called integers (int). The other type, floats (float), consists of numbers with decimal places, such as 1.5, 45.11, and 3,000.5. Python always uses periods (.) to indicate decimal places. In some countries, however, decimal places are written with commas (,), so be aware of your surroundings!
Here are some examples of number types mixed with variables - a concept introduced earlier in section 2.2.0:
# Integers
variable_a = 5
variable_b = 90
variable_c = 14000
# Use Tab for autocompletion (e.g., type var below and press Tab). It saves you from making typos and, even better, saves you time hunting down those typos!
# print(variable_a, variable_b, variable_c)
variable_a
variable_b
var
Make sure that you DO NOT add commas as digit separators (i.e., for thousands, millions, etc.), otherwise you will get a completely different result. Worse, you do not get any errors or warnings! See the example below:
# See what happens when we use comma separators in our numbers!
variable_d = ...
variable_d # variable_d is a variable with two elements because of the comma
type(...) # it is assigned as a "tuple" (more on that later)
What just happened?
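The comma built a tuple of two integers rather than a single number. Here is a short sketch of the pitfall, plus a safe way to keep long numbers readable: Python 3.6 and later accept underscores as digit separators inside numeric literals.

```python
# A comma is NOT a thousands separator in Python: it builds a tuple
with_comma = 1,400
print(with_comma)        # (1, 400)
print(type(with_comma))  # <class 'tuple'>

# Write large numbers without separators...
plain = 1400
# ...or use underscores, which Python ignores inside number literals
readable = 1_400
print(plain == readable)  # True
```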
# Floats
variable_1 = ...
variable_2 = ...
variable_3 = ...
variable_1
variable_2
variable_3
If you are not sure or forgot what a variable's data type is, just use the type() function and pass the variable as an argument.
# Check the typing of some of our variables
type(...)
type(...)
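A short sketch of how Python tells the two number types apart, using values from the text above (the variable names are hypothetical). Anything written with a decimal point becomes a float, and mixing an int with a float in arithmetic produces a float.

```python
whole = 1400      # int: no decimal point
decimal = 3000.5  # float: the period marks the decimal place

print(type(whole))    # <class 'int'>
print(type(decimal))  # <class 'float'>

# A whole number written with a decimal point is still a float
print(type(5.0))      # <class 'float'>

# int + float gives a float
total = whole + decimal
print(total, type(total))  # 4400.5 <class 'float'>
```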
Now that we have some variables, we can use some mathematical operations and see what we get.
# Basic operations
variable_a + ... # Addition
variable_1 * ... # Multiplication
variable_c / ... # Division
variable_2 - ... # Subtraction
Use the ** operator to calculate exponentiation
In some other languages you may use ^ to represent exponents, but in Python ^ is the bitwise XOR operator; use the double asterisk ** to perform exponentiation instead.
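One detail worth knowing about the operations above: in Python 3 the / operator always returns a float, even when the division is exact. A small sketch (the values mirror variable_a and variable_c) showing this and the related floor-division operator //, which drops the fractional part:

```python
a = 5
c = 14000

print(a + 90)        # 95
print(c / a)         # 2800.0 -> / always returns a float
print(c // a)        # 2800   -> floor division of two ints returns an int
print(7 / 2, 7 // 2) # 3.5 3
```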
# Exponentiation
variable_2 ...
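A quick sketch contrasting ** with ^, since the second one runs without any error but gives a very different answer:

```python
print(2 ** 3)    # 8 -> exponentiation
print(2 ** 0.5)  # a fractional exponent gives a root (here, the square root of 2)
print(2 ^ 3)     # 1 -> ^ is bitwise XOR in Python, NOT a power!
```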
Use the sqrt() function to determine square roots
A common mathematical operation you may also wish to perform is determining the square root of a value. To accomplish this, use the sqrt() function.
# Square root
...(variable_1 * variable_a)
Why did we get an error and what does it mean? Look it up on the internet. What did you look for and what did you find?
# If you can't find any information about a function, what does this likely mean?
help(sqrt)
# Square root calculation requires the math module!
...
sqrt(variable_1 * variable_a)
If math is already imported, why is it still not working?
# Remember the dot notation:
variable_root = ...(variable_1 * variable_a)
variable_root
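A sketch of the two ways to reach sqrt() once the math module is involved: through dot notation, or by importing the function by name. Because math.sqrt() only computes square roots, other nth roots can be taken with ** and a fractional exponent.

```python
import math  # sqrt() lives in the math module, not in core Python

# Dot notation: module_name.function_name
print(math.sqrt(16))  # 4.0

# Alternatively, import just the function you need
from math import sqrt
print(sqrt(2))

# For other nth roots, raise to the power 1/n (cube root shown here)
cube_root = 27 ** (1 / 3)
print(round(cube_root, 10))  # 3.0 (floating-point arithmetic is not exact)
```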
That is better. Now, that number is awfully long. Round it to two decimal places:
round(...)
# look up how to calculate the remainder in Python
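A sketch of round() on a long value (here math.sqrt(30), chosen only as an example). The second argument is the number of decimal places; without it, round() returns an int. Note that round() goes to the nearest value rather than always up, and exact ties go to the nearest even number.

```python
import math

value = math.sqrt(30)   # hypothetical long result
print(value)
print(round(value, 2))  # 5.48 -> two decimal places
print(round(value))     # 5    -> no second argument returns an int

# Ties are rounded to the nearest even number
print(round(0.5), round(1.5), round(2.5))  # 0 2 2
```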
That is enough about numbers for now. Let us move on to other data types.
Strings are just text or, more technically speaking, sequences of characters. The name comes from the idea of characters strung together one after another. Things like words, names, and even numbers can be strings if coded properly.
Really anything wrapped in "quotes" can be interpreted as a string.
# Generate your first strings
string_1 = ...
string_1
string_2 = ...
string_2
Strings can also be made up of numbers, as long as the numbers are wrapped between single or double quotes:
string_3 = ...
string_3
# What is the type of this variable?
type(...)
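A small sketch (with hypothetical variable names) showing that single and double quotes create the same kind of string, and that digits in quotes are text rather than a number:

```python
greeting_single = 'Hello'
greeting_double = "Hello"
print(greeting_single == greeting_double)  # True: both quote styles make the same string

number_text = "42"  # digits inside quotes are text...
number_value = 42   # ...while digits without quotes are a number
print(type(number_text), type(number_value))  # <class 'str'> <class 'int'>
print(number_text == number_value)  # False: a str never equals an int
```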
Use the + operator to concatenate strings
While the + operator has a specific meaning when dealing with numbers, its behaviour is actually context-based. When running a command, Python interprets certain operators based on the objects provided. In the case of strings, Python recognizes that strings are present on both sides of the operator and concatenates them.
... # Notice the space in between the quotes
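A minimal sketch of concatenation with hypothetical variable names. The + operator joins strings end to end and adds nothing in between, which is why a separate " " string is needed for the space:

```python
first_word = "My"
second_word = "string"

print(first_word + second_word)        # Mystring
print(first_word + " " + second_word)  # My string
```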
Every string can be considered a line of characters where each character is assigned a position. These positions are called indices, and in Python the first position is 0, a concept known as "zero indexing". For novice and seasoned programmers alike this can lead to some headaches, especially when switching between languages!
We can subset strings using indexing, which simply means retrieving elements based on their numerical position in our variable. This train of thought will also apply later to more complex data structures!
For example, the letter "M" in string_1 is at the index 0. Spaces, denoted by a space within quotes (" "), are also accounted for when indexing. Here are the indices for all the elements in string_1: M=0, y=1, " "=2, f=3 , i=4, r=5, s=6, t=7, " "=8, s=9, t=10, r=11, i=12, n=13, g=14, !=15.
To access specific elements of a string, we use the index operator, denoted by a position within a set of square brackets [<index>]:
# access the first element of string_1
...
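A sketch assuming string_1 holds "My first string!", the value implied by the index table above:

```python
string_1 = "My first string!"

print(string_1[0])         # M    -> the first character is at index 0
print(string_1[2] == " ")  # True -> spaces count as characters too
print(string_1[3])         # f
print(len(string_1))       # 16   -> valid indices run from 0 to 15
```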
Accessing elements from a string is that simple! You can also access a range of those elements by using a colon : between two indices. This is known as slicing, and we can use the short notation [start:stop] or the full notation [start:stop:step], which we will cover more in lecture 6!
Now, we are interested in accessing only the word "first" from string_1, which ranges from index 3 to 7. How would you do it?
string_1[...]
# watch out for 0 indexation and inclusive:exclusive ranges!!!!!!!!!!!!!!!
string_1[...]
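A sketch of the inclusive:exclusive rule, again assuming string_1 is "My first string!". The word "first" occupies indices 3 to 7, so the stop value must be one past the last index you want:

```python
string_1 = "My first string!"

print(string_1[3:8])  # first -> index 8 is excluded
print(string_1[3:7])  # firs  -> stopping at 7 drops the final "t"
```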
The elements of a string can also be accessed from the end by using negative integers, either to access single values or ranges. Note, however, that there is no zero in reverse indexing, so the last character in a string is at -1.
# In reverse order, there is no 0 indexation!!!!!!!!!
string_1[...]
# But our slicing rules still apply!
string_1[...]
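A sketch of negative indices on "My first string!" (assumed value of string_1). The slicing rule still holds: a stop of -1 excludes the last character.

```python
string_1 = "My first string!"

print(string_1[-1])     # !      -> the last character
print(string_1[-7:-1])  # string -> stops just before the "!"
```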
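Python also allows us to shorten our slicing notation by leaving out the start or stop value, in which case default values are used: an omitted start means "from the beginning" and an omitted stop means "to the end". Because slices are half-open intervals ([inclusive:exclusive]), a slice that stops at an index and a slice that starts at that same index fit together without overlapping.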
# Retrieve the first 5 elements of string_1
string_1[...]
string_1[...]
# Retrieve everything after the 5th element
string_1[...]
# Not the same!
string_1[...]
# Retrieve the last 5 elements of string_1
string_1[...]
# Not the same!
string_1[...]
# What does this return?
string_1[...]
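A sketch of the default start and stop values, assuming string_1 is "My first string!". It also shows a negative step, which walks through the string backwards:

```python
string_1 = "My first string!"

print(string_1[:5])    # My fi       -> from the start up to (not including) index 5
print(string_1[5:])    # rst string! -> from index 5 to the end
print(string_1[:5] + string_1[5:] == string_1)  # True: the two halves never overlap
print(string_1[-5:])   # ring!       -> the last five characters
print(string_1[::-1])  # !gnirts tsrif yM -> a step of -1 reverses the string
```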
Time for booleans!
Booleans are a data type with only two possible values: True or False (think yes or no, on or off). Despite their simplicity, booleans are very powerful when used in functions as they can make decisions for you: if the output of a command is True, do x; if False, do y.
Booleans go hand-in-hand with comparison operators, which return True or False. Here are some examples:
5 ... 4 # Exactly equal. == is not assignment but assessing if the two sides are exactly the same
5 ... 4 # Different than
5 ... 4 # Greater than
5 ... 4 # Less than
5 ... 4 # Greater than or equal to
5 ... 4 # Less than or equal to
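A sketch of the six comparison operators on 5 and 4, followed by a tiny if/else showing how a boolean result can drive a decision (the messages are only illustrative):

```python
print(5 == 4)  # False
print(5 != 4)  # True
print(5 > 4)   # True
print(5 < 4)   # False
print(5 >= 4)  # True
print(5 <= 4)  # False

# Comparisons return bool values that you can store and reuse
is_bigger = 5 > 4
print(type(is_bigger))  # <class 'bool'>

if is_bigger:
    print("5 is bigger than 4")
else:
    print("5 is not bigger than 4")
```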
Sometimes we wish to change one data type to another. When running certain commands, Python may complete these type conversions automatically, which is known as coercion. When we explicitly convert from one data type to another, it is known as casting. You can cast between certain data types and also object types.
For example, data types can be converted explicitly using functions such as str() and int().
# Define an integer as a string
string_4 = ...
type(string_4)
# Cast that string into an actual integer
type(...)
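A sketch of casting in both directions, assuming a string such as "50" (a hypothetical value; the skeleton leaves string_4 blank). Only strings that actually look like numbers can be cast this way: int("fifty") would raise a ValueError.

```python
string_4 = "50"             # looks like a number, but it is text
as_int = int(string_4)      # cast str -> int
as_float = float(string_4)  # cast str -> float
back_to_str = str(as_int)   # cast int -> str

print(type(as_int), type(as_float), type(back_to_str))
# <class 'int'> <class 'float'> <class 'str'>
print(as_int + 10)          # 60
```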
Operators such as + expect compatible data types on both sides rather than a mix of, say, strings and integers in the same command. Python doesn't know your intentions as a programmer, so you need to guide it to perform the right operation.
For example, converting strings into numbers allows us to operate with them mathematically.
string_d = ...
b = 10
c = ...
print("The value of c is", c)
Therefore type conversion must be done before proceeding with the operation, otherwise errors will come up:
string_d = "50"
b = 10
c = ... # Python will not coerce this for you!
print("The value of c is", c)
Let's take a quick trip under the Python hood. If we swap the order of our expression above, will the error be the same? When evaluating a binary operation such as +, Python first asks the left-hand object how to perform it, so the error message depends on which object comes first, even though both orders still end in an error.
string_d = "50"
b = 10
c = ... # Python will not coerce this for you!
print("The value of c is", c)
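A sketch that triggers both versions of the error on purpose and captures the messages with try/except, so you can compare them side by side. The exact wording of each message can vary between Python versions, but the two orders produce different messages:

```python
string_d = "50"
b = 10

messages = []
for left, right in [(string_d, b), (b, string_d)]:
    try:
        left + right
    except TypeError as err:
        messages.append(str(err))
        print(type(left).__name__, "+", type(right).__name__, "->", err)

# Both orders fail, but the left operand decides which message you see
print(messages[0] != messages[1])  # True

# Casting first avoids the error entirely
print(int(string_d) + b)  # 60
```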
Similarly, booleans can be converted into integers: True becomes 1 and False becomes 0 (zero). In the other direction, bool() treats 0 as False and any other number, positive or negative, as True.
int(True) # True is 1
int(False) # False is 0
bool(0) # 0 is equivalent to boolean False
bool(1) # 1 or any other number different than 0 are True
bool(...)
bool(...)
# Calculate the sum of True, False, b (variable), variable_1, and string_d
True + ...
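A sketch of booleans behaving like the integers 1 and 0. Note that a string such as string_d would still need int() before it can join a sum like this.

```python
print(int(True), int(False))         # 1 0
print(bool(0), bool(1), bool(-3.5))  # False True True
print(bool(0.0))                     # False: zero in any numeric form is False

# Because True acts like 1 and False like 0, they can be added up
print(True + True + False)             # 2
print(sum([True, False, True, True]))  # 3 -> a handy way to count True values
```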
We will, however, certainly try our best with the remainder of the course!
That's our first class on Python! You've made it through and we've learned about the following:
At the end of this lecture, a Quercus assignment portal will be available to submit your completed skeletons from today (including the comprehension question answers!). These will be due one week later, before the next lecture. Each lecture skeleton is worth 2% of your final grade, but a bonus 0.7% will also be awarded for submissions made within 24 hours of the end of lecture (i.e., 1700 hours the following day).
Soon after the end of each lecture, a homework assignment will be available for you in DataCamp. Your assignment is to complete chapters 1 (Python Basics, 1050 possible points) and 3 (Functions and Packages, 950 possible points) from the Introduction to Python course. This is a pass-fail assignment, and in order to pass you need to achieve at least 1,500 points (75%) of the total possible points. Note that taking hints in a DataCamp chapter will reduce your total earned points for that chapter.
In order to properly assess your progress on DataCamp, at the end of each chapter, please take a screenshot of the summary. You'll see this under the "Course Outline" menubar seen at the top of the page for each course. It should look something like this:
A sample screenshot for one of the DataCamp assignments. You'll want to combine yours into single images or PDFs if possible.
Submit the file(s) for the homework to the assignment section of Quercus. This allows us to keep track of your progress while also producing a standardized way for you to check on your assignment "grades" throughout the course.
You will have until 13:59 hours on Thursday, January 20th to submit your assignment (right before the next lecture).
Revision 1.0.0: materials prepared by Oscar Montoya, M.Sc. Bioinformatician, Education and Outreach, CAGEF.
Revision 1.1.0: edited and prepared for CSB1021H S LEC0140, 06-2021 by Calvin Mok, Ph.D. Bioinformatician, Education and Outreach, CAGEF.
Revision 1.2.0: edited and prepared for CSB1021H S LEC0140, 01-2022 by Calvin Mok, Ph.D. Bioinformatician, Education and Outreach, CAGEF.
The easiest and quickest way to start getting help debugging your code is to copy and paste your error message into a search engine such as Google or DuckDuckGo. From those results, you can narrow down your search and even start using more specialized jargon that gives you more precise results. Below are some of the most common types of errors that you will encounter, along with recommendations to solve them:
File does not exist: This error arises when your code looks for a file in your current working directory but the file is not there. The os module is your friend for navigating your computer's file system from within Python. Use os.getcwd() to check where you are working, os.listdir() or the Files tab in the left pane to check that your file exists there, and os.chdir() to change your directory if necessary.
Typos: Python is case sensitive, so always check that you've spelled everything right. Get used to using the tab-autocompletion feature when possible.
Open quotes, parentheses, brackets: Jupyter highlights matching brackets, which helps to spot where parentheses or brackets are missing or not closed.
Data type: Use commands like type() to check what type of data you have.
Unexpected answers: To access the help menu, type help(module.function). Make sure packages/modules are imported before using the help() function. Alternatively, use a question mark (?) followed by the module/function: ?module.function
Function not found: Make sure the package name is properly spelled, installed, AND imported.
Cheatsheets: Meet your new best friends: Cheatsheets!
Beginners' advice: At this level, many people have already had and solved the problem you're facing. Beginners get frustrated because they get stuck and spend hours trying to solve a problem on their own. Go online and look for help if books and help() did not help much.
Finding answers online
Asking a question
Remember: Everyone looks for help online ALL THE TIME. It is very common. Also, with programming there are multiple ways to come up with an answer, even different packages that let you do the same thing in different ways. You will work on refining these aspects of your code as you go along in this course and in your coding career.
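Several of the tips above (checking your working directory, confirming that a file is really there, and inspecting data types) can be combined into a few lines of Python. The sketch below is one way to do this. The function name check_file, the file name my_data.csv and the folder 'path/to/folder' are hypothetical placeholders used only for illustration.

```python
import os


def check_file(filename):
    """Return True if `filename` exists relative to the current working directory.

    If it is missing, print the working directory and its contents to help debugging.
    """
    cwd = os.getcwd()  # where Python is currently looking for files
    if os.path.exists(filename):
        print(f"Found {filename} in {cwd}")
        return True
    print(f"{filename} was not found in {cwd}")
    print("Files and folders here:", os.listdir(cwd))
    # 'path/to/folder' is a placeholder for the folder that holds your file
    print("Use os.chdir('path/to/folder') to move to the folder that contains it.")
    return False


# Hypothetical file name; replace it with the file you are trying to open
check_file("my_data.csv")

# type() explains why something like "42" + 1 fails: the value is a string, not a number
value = "42"
print(type(value))       # <class 'str'>
print(type(int(value)))  # <class 'int'>
```

os.path.exists() accepts both relative and absolute paths. You can therefore also pass the full path to a file if you are unsure which folder you are in.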
For this introductory course we will be teaching and running code for Python through Jupyter notebooks. In this section we will discuss
As of 2021-06-09, the latest version of Anaconda3 runs with Python 3.8.10.
Download the version for your operating system from https://www.anaconda.com/products/individual
All versions should come with Python 3.8.10
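Once Anaconda is installed, you can confirm which Python version a notebook is actually running from inside a code cell. Below is a minimal sketch that uses the standard-library sys and platform modules. The version you see may differ from 3.8.10, depending on when you installed Anaconda.

```python
import platform
import sys

# Full version string, including build details
print(sys.version)

# Just the version number as a string, for example "3.8.10"
print(platform.python_version())

# Comparing tuples is a convenient way to check for a minimum version
if sys.version_info >= (3, 8):
    print("Python 3.8 or newer")
else:
    print("Older than Python 3.8; consider updating Anaconda")
```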
Windows:
MacOS:
Unix:
To save time, we will update only our packages through the command line using the Anaconda prompt. You'll need to find the menu shortcut to the prompt in order to run these commands. Before class, you should update all of your Anaconda packages; this ensures you have the latest version of Jupyter Notebook. Open the Anaconda prompt and type the following command:
conda update --all
At some point, conda will ask for permission to proceed (Proceed ([y]/n)?). Answer yes by typing y and pressing Enter.
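After conda update --all finishes, you can check from a notebook which versions of specific packages are now installed. This sketch uses importlib.metadata from the standard library, which is available in Python 3.8 and newer. The helper name installed_version and the list of package names are illustrative choices, not part of the course materials.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Example package names; swap in whichever packages you want to check
for pkg in ["notebook", "jupyterlab", "pandas"]:
    print(pkg, installed_version(pkg) or "not installed")
```

Returning None for a missing package, rather than letting the error propagate, lets the loop report on every package in the list. A single missing package would otherwise stop the loop.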
You may find that for some reason or another, you'd like to maintain a specific Python-environment (or other) to work in. Environments in Anaconda work like isolated sandbox versions of Anaconda within Anaconda. When you generate an environment for the first time, it will draw all of its packages and information from the base version of Anaconda - kind of like making a copy. You can also create these in the Anaconda prompt. You can even create new environments based on specific versions or installations of other programs. For instance, we could have tried to make an environment for a separate programming language, R version 4.0.3, with the command
conda create -n my_R_env -c conda-forge/label/main r-base=4.0.3=hddad469_3
This would create a new environment with version 4.0.3 of R, while the base version of Anaconda would retain version 3.6.1 of R. This is a small but helpful detail if you are unsure about newer versions of packages that you'd like to use.
Likewise, you can update and install packages in new environments without affecting or altering your base environment! Again, this is helpful when you're upgrading or installing new packages and programs. If you're not sure how they will affect what you already have in place, you can install them straight into an environment.
For more information: https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#cloning-an-environment
If you are inclined, the Anaconda Navigator can help you make a Python environment separate from the base, but you won't be able to perform the same fancy tricks as in the prompt, like installing new packages directly to a new environment.
Note: You should consider doing this only if you have a good reason to isolate what you're doing in Python from the Anaconda base packages. Perhaps you are installing a new or older version of the kernel to be compatible with a package you really want to use?
The Anaconda Navigator is a graphical interface that shows all of your pre-installed packages and gives you access to installing other common programs like RStudio (we'll get to that in a moment).
You will now have a Python environment where you can install specific packages that won't make their way into your Anaconda base.
You will likely find a shortcut to this environment in your (Windows) menu under the Anaconda folder. It will look something like Jupyter Notebook (IntroPy)
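When you open a notebook from an environment-specific shortcut such as Jupyter Notebook (IntroPy), it can be useful to confirm that the kernel is really running in that environment and not in base. The sketch below relies on two things. sys.prefix is the root folder of the environment that the running interpreter belongs to. CONDA_DEFAULT_ENV is an environment variable that conda sets on activation, and it may be absent depending on how Jupyter was launched. The helper name describe_environment is hypothetical.

```python
import os
import sys


def describe_environment():
    """Collect basic details about the environment running this Python kernel."""
    return {
        # Root folder of the environment this interpreter belongs to
        "prefix": sys.prefix,
        # Full path to the Python executable itself
        "executable": sys.executable,
        # Set by `conda activate`; None if the variable is not present
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


for key, val in describe_environment().items():
    print(f"{key}: {val}")
```

Conda usually places named environments in an envs folder inside the Anaconda installation. A prefix ending in something like envs/IntroPy therefore typically means the kernel is using that environment. A prefix pointing at the Anaconda installation folder itself usually means you are in base.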
The Centre for the Analysis of Genome Evolution and Function (CAGEF) at the University of Toronto offers comprehensive experimental design, research, and analysis services in microbiome and metagenomic studies, genomics, proteomics, and bioinformatics.
From targeted DNA amplicon sequencing to transcriptomes, whole genomes, and metagenomes, from protein identification to post-translational modification, CAGEF has the tools and knowledge to support your research. Our state-of-the-art facility and experienced research staff provide a broad range of services, including both standard analyses and techniques developed by our team. In particular, we have special expertise in microbial, plant, and environmental systems.
For more information about us and the services we offer, please visit https://www.cagef.utoronto.ca/.